Symmetric Eigenvalue Decomposition - Lanczos Method

If the matrix $A$ is large and sparse and/or if only some eigenvalues and their eigenvectors are desired, iterative methods are the methods of choice. For example, the power method can be useful to compute the eigenvalue with the largest modulus. The basic operation in the power method is matrix-vector multiplication, and this can be performed very fast if $A$ is sparse. Moreover, $A$ need not be stored in the computer -- the input for the algorithm can be just a function which, given some vector $x$, returns the product $Ax$.
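
For illustration, here is a minimal power-method sketch (the name mypower and the stopping criterion are illustrative, not from any library). Note that it accesses $A$ only through a user-supplied function mul which returns the product $Ax$:

    using LinearAlgebra

    function mypower(mul::Function, x::Vector; tol=1e-10, maxiter=1000)
        x=x/norm(x)
        λ=zero(eltype(x))
        for i=1:maxiter
            y=mul(x)         # the only operation involving A
            λnew=x⋅y         # Rayleigh quotient ≈ dominant eigenvalue
            x=y/norm(y)
            abs(λnew-λ)<tol*abs(λnew) && return λnew,x
            λ=λnew
        end
        λ,x
    end

    # Example: A=Symmetric(rand(5,5)); mypower(v->A*v,rand(5))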

An improved version of the power method, which efficiently computes some eigenvalues (either largest in modulus or near some target value $\mu$) and the corresponding eigenvectors, is the Lanczos method.

For more details, see I. Slapničar, Symmetric Matrix Eigenvalue Techniques and the references therein.

Prerequisites

The reader should be familiar with concepts of eigenvalues and eigenvectors, related perturbation theory, and algorithms.

Competences

The reader should be able to recognise matrices which warrant the use of the Lanczos method, to apply the method, and to assess the accuracy of the solution.

Lanczos method

$A$ is a real symmetric matrix of order $n$.

Definitions

Given a nonzero vector $x$ and an index $k<n$, the Krylov matrix is defined as

$$ K_k=\begin{bmatrix} x & Ax & A^2 x &\cdots & A^{k-1}x \end{bmatrix}. $$

The Krylov subspace is the subspace spanned by the columns of $K_k$.
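
For example, a short sketch which builds $K_k$ column by column using only matrix-vector products (the name krylov is illustrative):

    # Build the Krylov matrix [x Ax A²x ⋯ A^(k-1)x]
    function krylov(A, x::Vector, k::Int)
        K=zeros(eltype(x),length(x),k)
        K[:,1]=x
        for j=2:k
            K[:,j]=A*K[:,j-1]
        end
        K
    end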

Facts

  1. The Lanczos method is based on the following observation. If $K_k=XR$ is the $QR$ factorization of the matrix $K_k$, then the $k\times k$ matrix $T=X^T A X$ is tridiagonal. The matrices $X$ and $T$ can be computed using only matrix-vector products in $O(kn)$ operations (see the numerical check after this list).

  2. Let $T=Q\Lambda Q^T$ be the EVD of $T$. Then $\lambda_i$ approximate well some of the largest and smallest eigenvalues of $A$, and the columns of the matrix $U=XQ$ approximate the corresponding eigenvectors.

  3. As $k$ increases, the largest (smallest) eigenvalues of the matrix $T_{1:k,1:k}$ converge towards some of the largest (smallest) eigenvalues of $A$ (due to the Cauchy interlace property). The algorithm can be redesigned to compute only the largest or smallest eigenvalues. Also, by using the shift-and-invert strategy, the method can compute eigenvalues near some specified value (see the example at the end of this notebook). In order to obtain better approximations, $k$ should be greater than the number of required eigenvalues. On the other hand, in order to obtain better accuracy and efficiency, $k$ should be as small as possible.

  4. The last computed element, $\mu=T_{k+1,k}$, provides information about accuracy: \begin{align*} \|AU-U\Lambda\|_2&=\mu, \\ \|AU_{:,i}-\lambda_i U_{:,i}\|_2&=\mu |Q_{ki}|, \quad i=1,\ldots,k. \end{align*} Further, there are $k$ eigenvalues $\tilde\lambda_1,\ldots,\tilde\lambda_k$ of $A$ such that $|\lambda_i-\tilde\lambda_i|\leq \mu$, and for the corresponding eigenvectors, we have $$\sin2\Theta(U_{:,i},\tilde U_{:,i}) \leq \frac{2\mu}{\min_{j\neq i} |\lambda_i-\tilde \lambda_j|}.$$

  5. In practical implementations, $\mu$ is usually used to determine the index $k$.

  6. The Lanczos method has an inherent numerical instability in floating-point arithmetic: since the Krylov vectors are, in fact, generated by the power method, they converge towards an eigenvector of $A$. Thus, as $k$ increases, the Krylov vectors become more and more parallel, the recursion in the function myLanczos() becomes numerically unstable, and the computed columns of $X$ cease to be sufficiently orthogonal. This affects both the convergence and the accuracy of the algorithm. For example, several eigenvalues of $T$ may converge towards a simple eigenvalue of $A$ (the so-called ghost eigenvalues).

  7. The loss of orthogonality is dealt with by using the full reorthogonalization procedure: in each step, the new ${\bf z}$ is orthogonalized against all previous columns of $X$; that is, in the function myLanczos(), the formula

    z=z-dv[i]*X[:,i]-ev[i-1]*X[:,i-1]

    is replaced by

    z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])

    To obtain better orthogonality, the latter formula is usually executed twice. The full reorthogonalization raises the operation count to $O(k^2n)$.

  8. The selective reorthogonalization is the procedure in which the current $z$ is orthogonalized against some selected columns of $X$, in order to attain sufficient numerical stability and not increase the operation count too much. The details are very subtle and can be found in the references.

  9. The Lanczos method is usually used for sparse matrices. A sparse matrix $A$ is stored in a sparse format in which only the values and indices of the nonzero elements are stored. The number of operations required to multiply a vector by $A$ is then proportional to the number of nonzero elements.

  10. The function eigs() implements the Lanczos method for real symmetric matrices and the more general Arnoldi method for general matrices.
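
As a quick numerical check of Fact 1, the following sketch (with arbitrarily chosen small dimensions) forms a Krylov matrix explicitly, takes its thin $QR$ factorization, and verifies that $X^TAX$ is tridiagonal up to roundoff. Forming $K_k$ explicitly is numerically poor and serves only as an illustration.

    using LinearAlgebra
    n,k=8,5
    A=Matrix(Symmetric(rand(n,n)))
    x=rand(n)
    K=hcat([A^(j-1)*x for j=1:k]...)  # Krylov matrix [x Ax ⋯ A^(k-1)x]
    X=Matrix(qr(K).Q)                 # thin Q = orthonormal basis of the Krylov subspace
    T=X'*A*X
    norm(tril(T,-2)), norm(triu(T,2)) # both ≈ 0, so T is tridiagonal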

Examples


In [1]:
using LinearAlgebra

In [16]:
function myLanczos(A::Array{T}, x::Vector{T}, k::Int) where T
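    # k steps of the Lanczos method with full reorthogonalization.
    # Returns Ritz values, Ritz vectors U=X*Q, the Lanczos basis X, and the residual bound μ.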
    n=size(A,1)
    X=Array{T}(undef,n,k)
    dv=Array{T}(undef,k)
    ev=Array{T}(undef,k-1)
    X[:,1]=x/norm(x)
    for i=1:k-1
        z=A*X[:,i]
        dv[i]=X[:,i]⋅z
        # Three-term recursion
        if i==1
            z=z-dv[i]*X[:,i]
        else
            # z=z-dv[i]*X[:,i]-ev[i-1]*X[:,i-1]
            # Full reorthogonalization - once or even twice
            z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])
            # z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])
        end
        μ=norm(z)
        if μ==0
            # Breakdown: after i steps an invariant subspace has been found
            Tr=SymTridiagonal(dv[1:i],ev[1:i-1])
            return eigvals(Tr), X[:,1:i]*eigvecs(Tr), X[:,1:i], μ
        else
            ev[i]=μ
            X[:,i+1]=z/μ
        end
    end
    # Last step
    z=A*X[:,end]
    dv[end]=X[:,end]⋅z
    z=z-dv[end]*X[:,end]-ev[end]*X[:,end-1]
    μ=norm(z)
    Tr=SymTridiagonal(dv,ev)
    eigvals(Tr), X*eigvecs(Tr), X, μ
end


Out[16]:
myLanczos (generic function with 1 method)

In [17]:
using Random
Random.seed!(421)
n=100
A=Matrix(Symmetric(rand(n,n)))
# Or: A = rand(n,n) |> t -> t + t'
x=rand(n)
k=10


Out[17]:
10

In [18]:
λ,U,X,μ=myLanczos(A,x,k)


Out[18]:
([-5.612209785565781, -4.677277965792197, -3.3124896385289544, -2.095142511035646, -0.5718147216543557, 1.3641355656123562, 3.0104143896243403, 4.458283323246534, 5.269226240622586, 50.08025479505324], [0.08539433654429654 -0.2651334050326902 … -0.0023560635962277645 0.10040211367914688; -0.06872323531174868 -0.0439774360720339 … 0.10455179778524816 0.10104871961626137; … ; -0.03964309443848085 -0.054712781784676694 … -0.12386468153358368 0.1111742540516016; 0.00903369587930853 0.0950335178225208 … 0.156378192057155 0.1046724815971024], [0.030929950649917496 0.1567903699202761 … 0.013863274554858604 -0.13276987537595433; 0.006103839203386471 0.19305287580437644 … 0.04725772943786309 0.05921235012170554; … ; 0.14682992679347637 -0.025119224274195936 … 0.043313764209208454 -0.1161584262261947; 0.054314993332550375 0.11146309456792886 … 0.021722017190658124 0.12728020085965652], 2.6002416667358426)

In [19]:
# Orthogonality
norm(X'*X-I)


Out[19]:
5.878159230720555e-15

In [20]:
X'*A*X


Out[20]:
10×10 Array{Float64,2}:
 36.5073       22.5598       -3.37291e-14  …   2.69284e-14  -5.85148e-15
 22.5598       12.4312        2.7439           1.54865e-14  -4.73551e-15
 -3.16394e-14   2.7439        0.520964         6.32247e-16  -1.81875e-15
  3.77015e-15   4.1185e-15    2.76685         -6.77409e-16   3.15142e-16
  2.83296e-14   1.66954e-14  -1.8791e-17       1.38801e-15  -4.86094e-16
  2.60436e-15  -1.1312e-15   -1.33846e-15  …  -1.25249e-15   7.82399e-16
 -2.57269e-14  -1.55997e-14  -3.58742e-16      4.05841e-16  -5.81826e-16
  3.57774e-15   3.72344e-15   1.48707e-15      2.81876      -3.31823e-16
  2.78801e-14   1.69598e-14   7.3814e-16       0.154228      2.78294
 -4.80117e-15  -4.79113e-15  -1.27874e-15      2.78294      -0.592886

In [21]:
# Residual
norm(A*U-U*Diagonal(λ)), μ


Out[21]:
(2.6002416667358417, 2.6002416667358426)
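
Fact 4 also bounds each individual residual by $\mu|Q_{ki}|$. Since $U=XQ$ and the columns of $X$ are (numerically) orthonormal, $Q$ can be recovered as $X^TU$; the following sketch, relying on the variables computed above, checks the bound column by column:

In [ ]:
# Per-eigenpair residuals versus the bound μ|Q[k,i]| from Fact 4
Q=X'*U
[norm(A*U[:,i]-λ[i]*U[:,i]) for i=1:k] ≈ μ*abs.(Q[end,:])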

In [22]:
U'*A*U


Out[22]:
10×10 Array{Float64,2}:
 -5.61221      -2.02886e-15  -1.42698e-15  …  -3.53231e-15  -3.14323e-15
 -1.96115e-15  -4.67728       3.06876e-16     -1.81307e-15  -3.88309e-15
 -1.31884e-15   8.904e-17    -3.31249          1.42073e-15  -1.23711e-14
  1.40735e-15  -4.34622e-15  -2.4383e-15       3.14046e-15  -2.97711e-14
  2.44373e-16   1.8483e-16    3.56221e-15      1.53087e-15  -1.89004e-14
 -2.24974e-15   5.45939e-15   3.05922e-15  …  -2.64548e-15   5.42856e-14
  3.25658e-15   3.91339e-15   4.83013e-16     -2.85309e-15   3.36078e-15
 -6.94047e-16   3.71779e-15   1.7264e-15      -1.64092e-14  -1.08789e-14
 -3.55324e-15  -2.03132e-15   1.30215e-15      5.26923      -6.9351e-16
 -1.97503e-15  -1.06203e-15  -7.15204e-15      1.80055e-15  50.0803

In [23]:
# Orthogonality
opnorm(U'*U-I)


Out[23]:
5.868743693127865e-15

In [24]:
# Full eigenvalue decomposition
λeigen,Ueigen=eigen(A);

In [25]:
using Arpack

In [26]:
?eigs


search: eigs λeigs Ueigs eigvecs eigvals eigvals! leading_ones leading_zeros

Out[26]:
eigs(A; nev=6, ncv=max(20,2*nev+1), which=:LM, tol=0.0, maxiter=300, sigma=nothing, ritzvec=true, v0=zeros((0,))) -> (d,[v,],nconv,niter,nmult,resid)

Computes eigenvalues d of A using implicitly restarted Lanczos or Arnoldi iterations for real symmetric or general nonsymmetric matrices respectively. See the manual for more information.

eigs returns the nev requested eigenvalues in d, the corresponding Ritz vectors v (only if ritzvec=true), the number of converged eigenvalues nconv, the number of iterations niter and the number of matrix vector multiplications nmult, as well as the final residual vector resid.

Examples

julia> using Arpack

julia> A = Diagonal(1:4);

julia> λ, ϕ = eigs(A, nev = 2);

julia> λ
2-element Array{Float64,1}:
 4.0
 3.0

eigs(A, B; nev=6, ncv=max(20,2*nev+1), which=:LM, tol=0.0, maxiter=300, sigma=nothing, ritzvec=true, v0=zeros((0,))) -> (d,[v,],nconv,niter,nmult,resid)

Computes generalized eigenvalues d of A and B using implicitly restarted Lanczos or Arnoldi iterations for real symmetric or general nonsymmetric matrices respectively. See the manual for more information.


In [48]:
# Lanczos method implemented in Julia
λeigs,Ueigs=eigs(A; nev=k, which=:LM, ritzvec=true, v0=x)


Out[48]:
([50.08025479505324, -5.708672899038032, -5.529785436841579, 5.3677137092636045, 5.335366315266961, -5.174771613218456, 5.044232434431196, -4.91332429038466, 4.886659498073604, 4.6994010943137665], [0.10040211367923428 0.009463091702392665 … -0.17640737495576178 -0.04475526151734935; 0.1010487196163353 -0.07424639981196725 … 0.07706357513862648 -0.06454175539660488; … ; 0.11117425405100118 0.004751857012553248 … 0.009582776875236863 0.08313988640361171; 0.10467248159703442 0.03260610416809978 … -0.10112311821700376 -0.08947230752630042], 10, 16, 136, [0.16641423615092277, 0.3069991956480348, -0.09279888891981491, 0.06770472174817771, 0.7789302852449385, 0.18192479147828827, 0.1851666565948333, 0.08373926199594331, -0.06392268034073552, 0.17708063652848324  …  0.312912374855199, -0.04848456559699422, 0.10255955734665244, -0.008876973759081204, 0.09533491953488805, 0.05564901129927252, -0.2328835507055555, -0.08733324216513018, -0.37362424559025365, 0.3029270662053501])

In [28]:
[λ λeigs λeigen[sortperm(abs.(λeigen),rev=true)[1:k]] ]


Out[28]:
10×3 Array{Float64,2}:
 -5.61221   50.0803   50.0803
 -4.67728   -5.70867  -5.70867
 -3.31249   -5.52979  -5.52979
 -2.09514    5.36771   5.36771
 -0.571815   5.33537   5.33537
  1.36414   -5.17477  -5.17477
  3.01041    5.04423   5.04423
  4.45828   -4.91332  -4.91332
  5.26923    4.88666   4.88666
 50.0803     4.6994    4.6994

We see that eigs() computes the $k$ eigenvalues of largest modulus. Which eigenvalues did myLanczos() compute?


In [29]:
for i=1:k
    println(minimum(abs,λeigen.-λ[i]))
end


0.08242434872420734
0.05337415718834748
0.01005749176668358
0.0035411656264545677
0.0088728191278582
0.012090555075940479
0.10762601320997645
0.04919537417827158
0.06614007464435012
0.0

The conclusion is that the naive implementation of the Lanczos method is not accurate enough. However, it works fine when all eigenvalues are computed. Why?


In [30]:
λall,Uall,Xall,μall=myLanczos(A,x,100)


Out[30]:
([-5.708672899038037, -5.529785436841564, -5.174771613218457, -4.9133242903846694, -4.623903808603852, -4.453058147533066, -4.321794529434708, -4.239075211024571, -4.037726839690757, -3.9632230728402336  …  4.173113331966948, 4.33831899156518, 4.409087949068266, 4.570065350557669, 4.699401094313775, 4.886659498073603, 5.044232434431188, 5.33536631526694, 5.3677137092636125, 50.080254795053236], [0.009463091702392732 -0.11197602291301287 … 0.033381583751849554 0.10040211367923413; -0.07424639981196732 -0.1125275658824402 … -0.011280957419505676 0.10104871961633517; … ; 0.00475185701255292 0.14419179045292088 … -0.004464551785012557 0.11117425405100123; 0.03260610416810023 -0.1618127337269178 … -0.0136039316957444 0.10467248159703424], [0.030929950649917496 0.1567903699202761 … -0.13014166647608172 0.08953216869068598; 0.006103839203386471 0.19305287580437644 … 0.05595002120421078 0.047392306572044886; … ; 0.14682992679347637 -0.025119224274195936 … 0.13138875021692706 -0.08929478042989335; 0.054314993332550375 0.11146309456792886 … 0.10695944457385113 0.13558249071086703], 1.9780138077750376e-13)

In [31]:
# Residual and relative errors 
norm(A*Uall-Uall*Diagonal(λall)), norm((λeigen-λall)./λeigen)


Out[31]:
(6.168066657865469e-13, 3.7165561229242204e-13)

In [32]:
methods(eigs);

Operator version

We can use the Lanczos method with an operator which, given a vector x, returns the product A*x. We use the function LinearMap() from the package LinearMaps.jl.


In [33]:
# If needed: import Pkg; Pkg.add("LinearMaps")
using LinearMaps

In [35]:
methods(LinearMap);

In [36]:
# Operator from the matrix
C=LinearMap(A)


Out[36]:
LinearMaps.WrappedMap{Float64,Array{Float64,2}}([0.3454433335084457 0.22931928037283833 … 0.9054195638847085 0.6421308739214497; 0.22931928037283833 0.26482397905680677 … 0.514178051224931 0.32183186621992177; … ; 0.9054195638847085 0.514178051224931 … 0.7036007158251258 0.47218710079252535; 0.6421308739214497 0.32183186621992177 … 0.47218710079252535 0.4726055260985629], true, true, false)

In [37]:
λC,UC=eigs(C; nev=k, which=:LM, ritzvec=true, v0=x)
λeigs-λC


Out[37]:
10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0

Here is an example of LinearMap() constructed from a function.


In [38]:
f(x)=A*x


Out[38]:
f (generic function with 1 method)

In [39]:
D=LinearMap(f,n,issymmetric=true)


Out[39]:
LinearMaps.FunctionMap{Float64}(f, 100, 100; ismutating=false, issymmetric=true, ishermitian=true, isposdef=false)

In [40]:
λD,UD=eigs(D, nev=k, which=:LM, ritzvec=true, v0=x)
λeigs-λD


Out[40]:
10-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
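
The operator need not come from a stored matrix at all. As a sketch (the 1D Laplacian stencil is chosen arbitrarily as an example), the following applies $\mathrm{tridiag}(-1,2,-1)$ directly, without ever forming the matrix:

In [ ]:
# Matrix-free 1D Laplacian: tridiag(-1,2,-1) applied as a function
lap(x)=[(i>1 ? -x[i-1] : 0.0)+2x[i]+(i<length(x) ? -x[i+1] : 0.0) for i=1:length(x)]
Lmap=LinearMap(lap,n,issymmetric=true)
λL,UL=eigs(Lmap; nev=3, which=:LM, v0=x)
λL  # exact eigenvalues are 4sin²(jπ/(2(n+1))), j=1,…,n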

Sparse matrices


In [41]:
using SparseArrays

In [42]:
?sprand


search: sprand sprandn StepRange StepRangeLen

Out[42]:
sprand([rng],[type],m,[n],p::AbstractFloat,[rfn])

Create a random length m sparse vector or m by n sparse matrix, in which the probability of any element being nonzero is independently given by p (and hence the mean density of nonzeros is also exactly p). Nonzero values are sampled from the distribution specified by rfn and have the type type. The uniform distribution is used in case rfn is not specified. The optional rng argument specifies a random number generator, see Random Numbers.

Examples

julia> sprand(Bool, 2, 2, 0.5)
2×2 SparseMatrixCSC{Bool,Int64} with 1 stored entry:
  [2, 2]  =  1

julia> sprand(Float64, 3, 0.75)
3-element SparseVector{Float64,Int64} with 1 stored entry:
  [3]  =  0.298614


In [43]:
# Generate a sparse symmetric matrix
C=sprand(n,n,0.05) |> t -> t+t'


Out[43]:
100×100 SparseMatrixCSC{Float64,Int64} with 959 stored entries:
  [9 ,   1]  =  0.075135
  [34,   1]  =  0.0892932
  [38,   1]  =  0.0856666
  [50,   1]  =  1.18956
  [64,   1]  =  0.80064
  [73,   1]  =  0.0585849
  [81,   1]  =  0.90935
  [93,   1]  =  0.69035
  [23,   2]  =  0.845193
  [33,   2]  =  0.116939
  [50,   2]  =  0.0421612
  [54,   2]  =  0.916119
  ⋮
  [19,  99]  =  0.938328
  [44,  99]  =  0.0311694
  [53,  99]  =  0.222692
  [55,  99]  =  0.147782
  [71,  99]  =  0.143101
  [74,  99]  =  0.751704
  [85,  99]  =  0.615809
  [19, 100]  =  0.799465
  [28, 100]  =  0.99927
  [40, 100]  =  0.298608
  [62, 100]  =  0.0882193
  [68, 100]  =  0.393678
  [81, 100]  =  0.779374

In [44]:
issymmetric(C)


Out[44]:
true

In [49]:
λ,U=eigs(C; nev=k, which=:LM, ritzvec=true, v0=x)
λ


Out[49]:
10-element Array{Float64,1}:
  5.552979231252339
 -3.5246020879568674
  3.458433097302157
  3.3666778746015513
 -3.36312749980201
 -3.226860281237621
  3.143588007491281
 -3.085299805736902
  2.9515943352563236
  2.9115278456739087
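
Fact 3 mentioned the shift-and-invert strategy: eigs() supports it through the keyword sigma, in which case $(C-\sigma I)^{-1}$ is applied implicitly, so the eigenvalues nearest the shift converge first. A sketch with an arbitrarily chosen shift (each iteration now requires solving a system with $C-\sigma I$, so it is more expensive):

In [ ]:
# Eigenvalues of C nearest the shift σ=1.0 (shift-and-invert)
λσ,Uσ=eigs(C; nev=3, sigma=1.0, v0=x)
λσ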
